# 04. Fixed Q-Targets
## Summary
In Q-Learning, we update a guess with a guess, and this can potentially lead to harmful correlations. To avoid this, we can update the parameters $w$ in the network $\hat{q}$ to better approximate the action value corresponding to state $S$ and action $A$ with the following update rule:

$$\Delta w = \alpha \left( R + \gamma \max_a \hat{q}(S', a, w^-) - \hat{q}(S, A, w) \right) \nabla_w \hat{q}(S, A, w)$$

where $w^-$ are the weights of a separate target network that are not changed during the learning step, and $(S, A, R, S')$ is an experience tuple.
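As a rough illustration of this update rule, here is a minimal PyTorch-style sketch of a single learning step with fixed Q-targets. The `QNetwork` class, the hyperparameter values, and the variable names are assumptions made for illustration; they are not part of the original lesson.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical network architecture; any works, as long as both copies are identical.
class QNetwork(nn.Module):
    def __init__(self, state_size, action_size, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size))

    def forward(self, state):
        return self.net(state)

GAMMA = 0.99                      # discount factor (assumed value)
state_size, action_size = 8, 4    # assumed environment dimensions

q_local = QNetwork(state_size, action_size)    # weights w, updated every learning step
q_target = QNetwork(state_size, action_size)   # weights w^-, held fixed during the step
q_target.load_state_dict(q_local.state_dict())
optimizer = optim.Adam(q_local.parameters(), lr=5e-4)

def learn(states, actions, rewards, next_states, dones):
    """One gradient step on w, using the frozen target weights w^- to form the TD target."""
    # TD target: R + gamma * max_a q_hat(S', a, w^-), computed without tracking gradients
    with torch.no_grad():
        q_next = q_target(next_states).max(dim=1, keepdim=True)[0]
        targets = rewards + GAMMA * q_next * (1 - dones)

    # Current estimate q_hat(S, A, w) for the actions actually taken
    q_expected = q_local(states).gather(1, actions)

    # Minimizing this loss corresponds to the update rule above
    loss = nn.functional.mse_loss(q_expected, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because `q_target` is not touched inside `learn`, the target in the loss stays put while `q_local` is adjusted toward it, which is exactly what breaks the "guess chasing a guess" correlation.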
Note: Ever wondered how the example in the video would look in real life? See: Carrot Stick Riding.
## Quiz
SOLUTION:
- The Deep Q-Learning algorithm uses two separate networks with identical architectures.
- The target Q-Network's weights are updated less often (or more slowly) than the primary Q-Network (see the sketch after this list).
- Without fixed Q-targets, we would encounter a harmful form of correlation, whereby we shift the parameters of the network based on a constantly moving target.
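To make the second point concrete, here is a hedged sketch of the two common ways the target weights $w^-$ are refreshed: a hard copy every fixed number of learning steps, or a slow "soft" update controlled by an interpolation factor. The function names and constants are illustrative assumptions, reusing `q_local` and `q_target` from the sketch above.

```python
TAU = 1e-3            # soft-update interpolation factor (assumed value)
UPDATE_EVERY = 1000   # hard-update period in learning steps (assumed value)

def hard_update(step):
    """Copy w into w^- every UPDATE_EVERY steps; between copies, w^- stays fixed."""
    if step % UPDATE_EVERY == 0:
        q_target.load_state_dict(q_local.state_dict())

def soft_update():
    """Move w^- a small fraction toward w: w^- <- tau*w + (1 - tau)*w^-."""
    for target_param, local_param in zip(q_target.parameters(), q_local.parameters()):
        target_param.data.copy_(TAU * local_param.data + (1.0 - TAU) * target_param.data)
```

Either variant keeps the target network lagging behind the primary network, so the TD target changes slowly rather than shifting with every gradient step.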